Journal of Pathology Informatics — Latest Matching Preprints

1

SortIT - A Tool For Assessing Observer Variability And Creating Ground Truth Image Classification Datasets

Uegami, W.; Bisson, T.; Okoshi, E. N.; Costa da Silva, F. G.; Jiragawasan, C.; Zerbe, N.; Bychkov, A.; Fukuoka, J.

2026-05-29 pathology 10.64898/2026.05.28.728616 medRxiv

Top 0.1%

15.8%

Interobserver variability in pathological assessments is a well-recognized challenge that impacts diagnostic reliability and disease understanding. This variability exists across many subspecialties due to the subjective nature of evaluations. Artificial intelligence (AI) applied to whole slide images has the potential to standardize procedures and reduce variability in pathology, but transitioning to these technologies does not guarantee improvement. Establishing reliable ground truth datasets with consensus annotations is crucial for developing robust AI solutions. We introduce SortIT, an open-source web application that facilitates systematic creation and evaluation of ground truth image tile annotations. SortIT enables multiple annotators to independently label tiles, with flexible user permission controls. Annotated data can be exported for statistical analysis of observer variation and for creating ground truth datasets from consensus tiles. We outline protocols using SortIT for several use cases: (1) mitosis segmentation in tumor regions, (2) evaluating AI solutions for prostate cancer grading by comparing to expert consensus, and (3) granuloma classification by annotating discriminative tile-level features. A key strength of SortIT lies in its ease of deployment, making it accessible and usable for a wide range of users. Overall, SortIT provides a valuable tool to establish high-quality ground truth datasets and comprehensively assess observer variability. Critical evaluation of ground truth quality using systematic annotation methodologies is crucial for developing accurate and generalizable diagnostic AI tools. SortIT's open-source nature facilitates community adoption and further development.
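The idea of building a ground truth dataset from consensus tiles can be illustrated with a minimal majority-vote sketch. It is not SortIT's export format or algorithm: the function name `consensus_labels`, the `min_agreement` threshold, and the tile identifiers and labels are all hypothetical, chosen only to show how tiles with insufficient annotator agreement might be excluded.

```python
from collections import Counter


def consensus_labels(annotations, min_agreement=0.75):
    """Keep tiles whose most common label reaches the agreement threshold.

    annotations: dict mapping a tile id to the list of labels given by
    independent annotators (hypothetical structure, not SortIT's export).
    Returns a dict mapping tile id -> consensus label.
    """
    consensus = {}
    for tile_id, labels in annotations.items():
        if not labels:
            continue  # no annotations for this tile
        label, votes = Counter(labels).most_common(1)[0]
        # Fraction of annotators who chose the most common label.
        if votes / len(labels) >= min_agreement:
            consensus[tile_id] = label
    return consensus


# Toy data: four annotators, three tiles (illustrative values only).
example = {
    "tile_001": ["mitosis", "mitosis", "mitosis", "mitosis"],
    "tile_002": ["mitosis", "non-mitosis", "mitosis", "non-mitosis"],
    "tile_003": ["non-mitosis", "non-mitosis", "non-mitosis", "mitosis"],
}
result = consensus_labels(example)
print(result)
```

In this toy example the evenly split tile is dropped, while the unanimous tile and the three-of-four tile are kept; a stricter or looser threshold is a design choice that depends on the use case.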

2

Leveraging Open-Source Solutions to Build a Low-Cost Digital Pathology Pipeline for Translational Research

Stenberg, J.; Gullapalli, A.; Foucar, K.; Babu, D.; Redemann, J.; Joste, N.; Foucar, C.; Gratzinger, D.; George, T.; Ohgami, R.; Gullapalli, R. R.

2026-04-27 pathology 10.64898/2026.04.25.26350240 medRxiv

Top 0.1%

15.1%

Digital Pathology (DP) is a fast-emerging branch of pathology focused on digitizing pathology data. Key challenges of DP usage for pathology laboratories, especially mid- to small-sized clinical labs, are the upfront costs associated with instrumentation and the logistical challenges of implementation. In the current project, we built an end-to-end DP solution using low-cost, open-source components that is user-friendly at a small scale. We repurposed readily available microscopy components in a pathology lab to assemble a fully functional DP pipeline for translational research applications. We tested multiple low-cost complementary metal-oxide semiconductor (CMOS) cameras in this project and chose a user-friendly Canon camera for image acquisition. An open-source DP server solution, OMERO v.5.6.4, was used as the image management system (IMS) to host and serve the whole slide images (WSIs) on an Ubuntu 22.04 operating system. The server-hosted WSIs were evaluated remotely and asynchronously by multiple pathologists physically situated in Albuquerque, NM; Salt Lake City, UT; and Palo Alto, CA. Each pathologist assessed the quality of the WSI pipeline, image quality, and WSI interaction experience using a 23-question survey. Overall, the pathologists found the custom, low-cost WSI pipeline robust and user-friendly. The current DP setup is unlikely to be useful as a commercial, scalable DP pipeline for large-scale clinical applications. However, it demonstrates the feasibility of creating customized, small-scale DP solutions (at a low price point) for asynchronous translational pathology research applications. Additionally, building customized DP pipelines provides excellent educational opportunities for pathology residents to gain in-depth knowledge of the various technical elements of a DP workflow.
In summary, we have established a low-cost, end-to-end WSI DP pipeline useful for spatiotemporally asynchronous translational pathology research, in an academic setting.

3

An Agentic, No Code Artificial Intelligence Workflow for Developing and Externally Validating a Thyroid Nodule Ultrasound Malignancy Classifier

Thomas, J.; Pozdeyev, N.

2026-06-26 endocrinology 10.64898/2026.06.23.26356395 medRxiv

Top 0.1%

12.9%

Convolutional neural networks (CNNs) can classify thyroid nodules on ultrasound, yet published models are seldom available for independent testing, require machine learning expertise to develop and deploy, and are validated mostly on papillary thyroid carcinoma. Objective. To test whether an autonomous (agentic), no code artificial intelligence (AI) agent can develop a calibrated thyroid-nodule malignancy classifier, and to validate it internally and on an external cohort spanning multiple cancer histologies. Methods. This is a retrospective, computational diagnostic study with prespecified endpoints. A no code agent (Hugging Face ML Intern) autonomously reviewed data, selected and trained the model, and calibrated probabilities, using the open source TN5000 dataset (3500 training, 500 validation, and 1000 test images). The trained ResNet 18 model was externally validated on 232 nodules from the University of Colorado, including follicular, medullary, oncocytic, and follicular variant of papillary carcinomas. Results. On the internal test set, the agentic AI model achieved an AUROC of 0.94 (95% CI, 0.920 - 0.953), sensitivity of 0.90, and specificity of 0.80. On external validation, the agentic AI model achieved an AUROC of 0.90 (95% CI, 0.850 - 0.936), sensitivity of 0.92, specificity of 0.68, negative predictive value of 0.96, and positive predictive value of 0.52, exceeding the performance of a previously published classifier on the same cohort (AUROC of 0.83). Conclusions. An agentic, no code AI workflow produced a calibrated, externally validated thyroid nodule classifier, supporting accessible, reproducible, and independently testable medical AI development. Prospective validation and local recalibration are required before clinical use.
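For readers less familiar with the reported metrics, the following minimal sketch shows how sensitivity, specificity, and the predictive values are derived from a 2x2 confusion matrix. The function name `diagnostic_metrics` and the counts are hypothetical toy values, not the study's data; the abstract does not report the underlying counts.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard diagnostic metrics from confusion-matrix counts.

    tp/fn: malignant nodules called malignant/benign.
    tn/fp: benign nodules called benign/malignant.
    """
    return {
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Toy counts (illustrative only, NOT taken from the study).
metrics = diagnostic_metrics(tp=45, fp=30, tn=120, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these toy counts, malignant cases are a minority, so the positive predictive value is noticeably lower than the sensitivity even though most malignant cases are detected; predictive values depend on prevalence, whereas sensitivity and specificity do not.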

4

Interpretable morphology mapping of peripheral blood leukocytes using annotation-efficient artificial intelligence

Liu, Z.; Castillo, S. P.; Han, X.; Sun, X.; Hu, Z.; Yuan, Y.

2026-05-26 pathology 10.64898/2026.05.22.725537 medRxiv

Top 0.1%

12.3%

Background. Peripheral blood smear (PBS) review is labor-intensive, subjective, and challenging for rare or morphologically heterogeneous cell types in hematologic malignancies. Artificial intelligence (AI) offers a scalable alternative, but broader clinical translation is constrained by annotation burden and limited interpretability. Methods. We developed an interpretable, annotation-efficient AI framework that learns leukocyte morphology through a two-stage process: label-free representation learning to construct a morphological embedding space, followed by supervised fine-tuning for cell type and morphological attribute classification. The model was trained and evaluated on 5,952 PBS images from cancer patients at MD Anderson Cancer Center, including blast cells, and 17,092 images from public sources. Active learning strategies were assessed to improve label efficiency, and interpretability was examined using saliency and embedding visualization. An interactive web application, HemoSight, was developed to support clinical review. Findings. The framework achieved a macro-F1 score of 0·96 for 9-way leukocyte classification on the internal test split and 0·83 on the held-out patient cohort. Active learning substantially reduced annotation requirements, reaching peak performance with only 13·3% of available labels and significantly improving learning efficiency across 8 of 9 cell types. The model generalized to classifying 11 leukocyte morphological attributes with a mean F1 score of 85·8% and revealed structured morphological landscapes. Saliency maps, embedding visualizations, and the HemoSight application enabled transparent morphological inspection of model predictions, supporting confidence in model behavior and feasibility for clinical integration.
Interpretation. Our framework enables scalable, annotation-efficient, and interpretable modeling of leukocyte morphology, supporting the integration of AI-assisted PBS review for hematopathology workflows. Funding. Seed funding from The University of Texas MD Anderson Cancer Center. Research in Context. Evidence before this study: Peripheral blood smear review is essential for diagnosing and monitoring hematologic malignancies, but manual case review is time-consuming and variable, particularly for rare or abnormal leukocyte types. Automated hematology analyzers are widely used to flag abnormal cells; however, they provide limited morphological insight and often require frequent manual correction, especially in cancer settings where disease and treatment alter cell appearance. Previous artificial intelligence approaches for leukocyte classification have shown promise, but most rely on fully supervised learning, require extensive expert annotation, focus on a limited set of cell types, and frequently exclude diagnostically important rare cells such as blasts. Interpretability is inconsistently addressed, and few studies provide tools that allow clinicians to inspect and interpret model outputs within routine workflows. Added value of this study: This study introduces an annotation-efficient framework trained on a large collection of peripheral blood smear images, including cancer patient samples with hematopathologist-verified rare cell types such as blasts. The framework learns leukocyte morphology from unlabeled images and adapts to multiple classification tasks with minimal expert labeling. Performance is evaluated on both internal test splits and a held-out patient cohort to provide a realistic estimate of generalization. Iterative, uncertainty-guided annotation substantially reduces labeling requirements while improving learning efficiency across most leukocyte classes.
Beyond cell-type classification, the framework is extended to 11 clinically relevant morphological attributes and reveals a structured morphological landscape. These capabilities are integrated into a web application, HemoSight, enabling real-time inference and transparent morphological inspection of predictions within hematopathology workflows. Implications of all the available evidence: Advancing artificial intelligence for hematology requires methods that reduce expert labeling demands, provide interpretable outputs, and perform reliably across clinically diverse patient samples. This study shows that learning from largely unlabeled data combined with iterative expert annotation can support scalable and flexible modeling of leukocyte morphology for classification tasks. Integrating quantitative predictions and interactive visualization supports the use of artificial intelligence as an assistive tool for diagnostic peripheral blood smear review, with potential to improve efficiency, consistency, and reviewer confidence.

5

Artificial intelligence-assisted ganglion cell detection in Hirschsprung's disease: A comparative evaluation of two deep learning approaches

Wang, E.; Grenier, K.; Savadjiev, P.; Poenaru, D. D.

2026-06-12 pathology 10.64898/2026.06.11.26354826 medRxiv

Top 0.1%

12.2%

Background. Definitive diagnosis of Hirschsprung's disease (HD) requires pathological identification of enteric ganglion cells. This process is time-consuming and subject to inter-observer variability. Artificial intelligence (AI) tools have the potential to standardize and accelerate this workflow, but no study has determined which AI approach best serves intraoperative HD pathology diagnostics. Method. This study compared the U-Net and You Only Look Once version 26 (YOLO26) frameworks for ganglion cell detection using a single-centre retrospective dataset of 54 whole-slide images (WSIs) from rectal biopsies. WSIs were tiled into 397,731 image patches (128x128 pixels), further partitioned into training (70%), validation (15%), and testing (15%) sets. Models were evaluated on tile- and patient-level diagnostic metrics and processing latency. Results. The U-Net achieved a tile-level sensitivity of 82.9%, showing no statistically significant difference compared to YOLO26 (79.1%; p = 0.097). However, YOLO26 demonstrated a statistically significant advantage in tile-level specificity (96.1% vs. 93.9%; p < 0.001) and reduced mean inference latency (7.64 ms vs. 11.57 ms/tile). At the patient level, both models achieved 100% diagnostic sensitivity. Despite low patient-level specificity (0.0% U-Net; 11.8% YOLO26), the tissue-level diagnostic burden of false positives was 6.00% for U-Net and 3.50% for YOLO26. Conclusion. The U-Net is preferred when nominal gains in sensitivity are prioritized, while the YOLO26 is an alternative that optimizes efficiency and false positive suppression. Both models serve as robust screening filters to augment the pathologist's workflow and should be selected based on workflow requirements. Prospective validation on larger, multi-centre datasets is required before clinical implementation.

6

Vessel Spatial Analysis (VeSpA): a tool for whole slide image segmentation, morphometry, and QuPath extension.

Grion, G.; Hussain, R.; Colella, F. E.; Roufail, K.; Uccella, S.; Frapolli, R.; Matteo, C.; Mintemur, O.; Pennati, F.; Renne, S. L.

2026-06-20 pathology 10.64898/2026.06.15.732366 medRxiv

Top 0.1%

9.7%

Quantifying vascular architecture in histological whole slide images is needed to study tissue organisation, tumour microenvironment biology, and disease-associated vascular remodelling. However, vessel analysis in routine immunohistochemistry remains challenging. Available workflows are often manual, require programming expertise, or lack direct integration with digital pathology platforms. We developed VeSpA (Vessel Spatial Analysis), an open-source pipeline and QuPath extension for automated vessel segmentation and morphometric quantification in CD31-stained whole slide images. VeSpA combines configurable signal extraction, using CMYK Yellow channel extraction by default and optional DAB stain deconvolution for H-DAB images, with automatic or percentile-based thresholding, morphological refinement, contour filtering, and lumen filling to generate vessel masks from standard DAB-stained sections. The QuPath extension includes a graphical interface for selecting annotations, TMA cores, or whole images, configuring segmentation parameters, running the Python backend, and importing vessel objects directly into the QuPath hierarchy. For each detected vessel, VeSpA extracts area, major axis length, minor axis length, eccentricity, centroid, and orientation, while also appending summary measurements to parent annotations and TMA cores. Validation against independent pathologist annotations showed that VeSpA achieved segmentation performance close to inter-rater agreement and outperformed yellow channel prompt-based SAM and zero-shot YOLOv8-seg on overlap-based metrics in the tested dataset. VeSpA integrates vessel segmentation, morphometric feature extraction, and QuPath-based visualisation into a single reproducible workflow for vascular quantification in computational pathology and spatial analysis of histological tissue architecture.
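To make the signal-extraction step more concrete, here is a minimal sketch of CMYK yellow channel extraction followed by a fixed threshold. It uses the common naive RGB-to-CMYK conversion and is not VeSpA's code: the abstract does not specify the exact conversion, and the function names `cmyk_yellow` and `threshold_mask`, the threshold of 0.5, and the pixel values are all assumptions for illustration.

```python
def cmyk_yellow(r, g, b):
    """Return the CMYK yellow component in [0, 1] for an 8-bit RGB pixel.

    Uses the naive conversion K = 1 - max(R, G, B), Y = (1 - B - K) / (1 - K).
    """
    rf, gf, bf = r / 255.0, g / 255.0, b / 255.0
    k = 1.0 - max(rf, gf, bf)
    if k >= 1.0:
        return 0.0  # pure black: yellow is undefined, treat as no signal
    return (1.0 - bf - k) / (1.0 - k)


def threshold_mask(image, threshold=0.5):
    """Binary mask: True where the yellow component reaches the threshold.

    image: list of rows, each row a list of (r, g, b) tuples.
    """
    return [[cmyk_yellow(*px) >= threshold for px in row] for row in image]


# Toy 2x2 image: white background, two brownish (DAB-like) pixels,
# and one bluish (hematoxylin-like) pixel. Values are illustrative.
image = [
    [(255, 255, 255), (150, 90, 40)],
    [(80, 80, 160), (120, 70, 30)],
]
mask = threshold_mask(image)
print(mask)
```

Brown pixels have a low blue component relative to their brightest channel, so they score high on yellow, whereas white background and blue counterstain score zero. A real pipeline would operate on arrays and follow the threshold with the morphological refinement and filtering steps the abstract describes.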

7

Understanding Human AI Discrepancy in Breast Cancer TIL Assessment: A Multi-Rater and Perceptual Bias Study

Capar, A.; Aloglu, I.; Aker, F.; Ertano, M.; Mese, Y. E.; Ungor, A.; Yildiz, B. E.

2026-06-04 pathology 10.64898/2026.05.29.26354196 medRxiv

Top 0.1%

6.3%

Objective: Tumor-infiltrating lymphocytes (TILs) in breast cancer are one of the most important indicators of the immune response within the tumor microenvironment. They play a particularly significant prognostic and predictive role in triple-negative and HER2-positive subtypes. However, substantial inter-observer variability has been reported in TIL scoring among pathologists, which limits its reliability in clinical practice. The aim of this study was to evaluate the agreement between artificial intelligence (AI) models and pathologists in TIL scoring and to compare this agreement using different statistical approaches, thereby assessing the potential of AI integration into pathology practice. Materials and Methods: Digitized histopathological images of breast cancer cases were included in the study. Tumor regions annotated by pathologists were evaluated for both stromal TIL percentage and the proportion of stromal tumor area within each ROI, with assessments performed independently by three pathologists and two AI models. Agreement was assessed among pathologists, between pathologists and AI, and between AI models. Statistical analyses included intraclass correlation coefficient (ICC), Cohen and Fleiss kappa, correlation tests, and Bland-Altman analysis. In addition, categorical agreement was examined using different cut-off values. Results: Inter-pathologist agreement was high, with an ICC of 0.81. In contrast, the global agreement between pathologists and AI models was lower (ICC 0.41). Pairwise comparisons of pathologist-AI agreement yielded substantially lower ICC values (0.12-0.21), although this improved to 0.53 when three pathologists were assessed jointly with a single AI model. The strongest categorical agreement was observed with dichotomized TIL scores (≤10% vs. >10%), whereas multi-category classifications were associated with a marked reduction in kappa values.
Spearman correlation coefficients between pathologists and AI models ranged from moderate to good (ρ = 0.48-0.81). Agreement between the two AI models themselves was moderate, with an ICC of 0.64.
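As an illustration of the categorical agreement analysis, the sketch below dichotomizes TIL percentages at the 10% cut-off mentioned in the abstract and computes Cohen's kappa between two raters. The function names `dichotomize` and `cohen_kappa` and all scores are hypothetical toy values, not the study's data or code.

```python
def dichotomize(scores, cutoff=10):
    """Map TIL percentages to 'low' (<= cutoff) or 'high' (> cutoff)."""
    return ["high" if s > cutoff else "low" for s in scores]


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal frequencies.
    p_e = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    if p_e == 1.0:
        return 1.0  # both raters used a single identical category
    return (p_o - p_e) / (1.0 - p_e)


# Toy stromal TIL percentages for eight regions (illustrative only).
pathologist = [5, 10, 20, 40, 8, 60, 15, 2]
ai_model = [8, 15, 25, 30, 5, 50, 5, 3]
kappa = cohen_kappa(dichotomize(pathologist), dichotomize(ai_model))
print(f"kappa = {kappa:.2f}")
```

In this toy example the raters agree on six of eight regions, and half of that agreement would be expected by chance, giving a kappa of 0.5. Adding more categories gives the raters more ways to disagree, which is consistent with the lower kappa values the abstract reports for multi-category classifications.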

8

Three multimodal large language models fail at clinically actionable breast pathology in three different directions

Kang, Y.-J.; Jun, S.-Y.; Kim, S.

2026-06-22 pathology 10.64898/2026.06.18.26355928 medRxiv

Top 0.1%

5.5%

Background. Breast cancer treatment depends on histopathological features, such as grade and receptor-defined subtype; however, specialist pathologist access is constrained when the workforce is limited. Commercial multimodal large language models (MLLMs) accept hematoxylin and eosin (H&E) image tiles through paid interfaces without local hardware or fine-tuning. However, prior pathology evaluations addressed only coarse tasks. Whether they reach treatment-determining accuracy and whether vendors agree remain unclear. Methods. We evaluated three vendor-designated flagship MLLMs (Claude Sonnet 4.6, Gemini 2.5 Pro, GPT-5.5) in 427 invasive breast cancer cases. Each case went to all three with identical H&E tiles and prompts, and the subtype was inferred in the second call. The reference was an institutional sign-out report of an immunohistochemistry-derived subtype. We calculated the concordance, sensitivity, specificity, Cohen's kappa, and pairwise McNemar and Bowker tests. Findings. Claude ranked highest by raw histologic-type concordance but lowest by kappa, classifying all 23 lobular and seven micropapillary carcinomas as invasive breast carcinoma of no special type. The models anchored the Nottingham grade to three modal grades. None of the models reliably identified human epidermal growth factor receptor 2-positive disease. The failure direction was vendor-specific: Claude and GPT-5.5 under-detected, whereas Gemini over-called. Twelve prompt variants (4,056 calls) did not recover sensitivity. Interpretation. No current commercial MLLM reaches deployment-ready accuracy for any treatment-determining feature of breast pathology. As each vendor fails in its own fixed direction, changing vendors alters the type of error rather than removing it; therefore, the value of these models is assistive rather than autonomous. At USD 0.20-0.50 per case, they may serve as supervised draft generators that leave the diagnosis with the pathologist.

9

Assessing Foundation Models for Computational Pathology in Endometrial Cancer

Volinsky-Fremond, S.; van den Berg, N.; Barkey Wolf, J.; Schoenpflug, L. A.; Andani, S.; Ortoft, G.; Jobsen, J. J.; Lutgens, L. C.; Powell, M. E.; Mileshkin, L. R.; Mackay, H.; Leary, A.; Razack, R. R.; de Bruyn, M.; de Boer, S. M.; Nout, R. A.; Smit, V. T.; Creutzberg, C. L.; Koelzer, V. H.; Bosse, T.; Horeweg, N.

2026-05-25 pathology 10.64898/2026.05.22.26353897 medRxiv

Top 0.1%

5.2%

Computational pathology leverages deep learning to extract clinically relevant information from digitized tumor slides, predicting histopathological subtypes, molecular alterations, and patient outcomes. Recent pipelines increasingly rely on foundation models trained on large pan-cancer datasets to generate generalizable features. In endometrial cancer (EC), their comparative performance for clinical diagnostic tasks remains unexplored. For the first time, this study evaluates the performance of seven state-of-the-art foundation models across morphological, molecular, and prognostic tasks using a large EC dataset of 3,293 patients from randomized trials and clinical cohorts. In addition, their performance was compared to one model (EsVIT) exclusively trained on EC. The foundation models H-OPTIMUS-0, CONCH, and VIRCHOW2 achieved the highest mean performance, but the best-performing foundation model varied by task. The top-performing foundation model outperformed the EC-specific feature extractor EsVIT across all tasks. This study highlights the superiority of foundation models over a domain-specific feature extractor in EC. Selecting the optimal foundation model for novel tasks remains challenging due to performance plateaus and limited information on the training datasets, requiring rigorous benchmarking and domain insight to reach maximum potential.

10

DIANNE: Segmentation-Free Localization of Histology Differential Attributes

Domanskyi, S.; Rubinstein, J. C.; Sheridan, T. B.; Thiesen, A.; Noorbakhsh, J.; Alcoforado Diniz, J.; Ramasamy, R.; Baker, D. S.; Sheldon, R.; Wu, Q.; Kuchel, G.; Robson, P.; Chuang, J. H.

2026-05-01 pathology 10.64898/2026.04.28.721103 medRxiv

Top 0.1%

5.2%

Pathologist-guided distinctions within histology and spatial omic images provide insights into health and disease, with digital pathology leveraging artificial intelligence to automate such assessments. To train computational models, current digital pathology methods rely on upfront manual annotations, which are time-consuming to generate. Pre-annotation is poorly suited to investigating novel spatial behaviors--a major need driven by advances in spatial profiling--for which annotation criteria and data needs will be uncertain. To address these challenges, we present DIANNE, a digital pathology approach for rapid training and inference of spatial differential attributes based on train-time Positive Class Mixup Augmentation. DIANNE can compute foundation model-derived segmentation-free localization of differential classifiers across whole slide H&E images within seconds on a workstation, enabling interactive investigation of spatial niches. Predictive models can be re-trained in real-time in response to patch or regional annotation changes, clarifying determinative biological attributes across slides from only a few dozen annotated patches. We demonstrate the effectiveness of DIANNE for tumor detection, artifact identification, and exploration of pancreatic, fetal membranes and kidney tissue structures. DIANNE also provides analogous capabilities for IHC, multiplex immunofluorescence, and registered spatial transcriptomic+H&E images. DIANNE is implemented in a Jupyter toolkit, enabling rapid development of high-resolution classifiers from weakly-supervised training. DIANNE provides a practical system to quantitatively understand known and novel spatial phenotypes.

11

Label-free 3D virtual histology of human formalin-fixed paraffin-embedded (FFPE) prostate needle biopsies with propagation-based phase-contrast micro-CT (PBCT)

Sugarman, A. L.; Vanselow, D. J.; Chen, G.; Clark, E.; Parkinson, D.; La Riviere, P.; Silverman, J.; Warrick, J.; Cheng, K. C. C.

2026-06-01 pathology 10.64898/2026.05.28.728215 medRxiv

Top 0.1%

3.7%

For over a century, the goal of estimating clinical outcome from tumor biopsies has been based on histomorphology of 2D tissue slices that represent a small fraction of collected samples. Its power derives from histology's 1) unbiased representation of cell types, 2) subcellular resolution that allows the characterization of health and disease states across cell types, and 3) multi-millimeter fields of view that allow assessment of tumor heterogeneity. Histology's dependence upon physical slices, however, limits assessment of 3-dimensional cellular volumes and tissue architecture. Here, we used propagation-based phase-contrast micro-CT (PBCT) to create 3D histological images of residual formalin-fixed, paraffin-embedded (FFPE) prostate needle biopsies. The resulting isotropic, grey-scale, 0.5 micron voxel matrices were used to explore the potential of 3D virtual histology to distinguish diagnostic categories including benign prostatic tissue and prostatic adenocarcinoma of Gleason patterns 3, 4, and 5. Maximum intensity projections of stacks of digital slices, forming 5 micron "slices", allowed the study of virtual sections corresponding to actual serial H&E-stained sections of tissue cut after micro-CT imaging. Like histology, our PBCT reconstructions allowed us to distinguish between non-infiltrative and undulating glands of benign prostatic tissue, infiltrative round glands of Gleason pattern 3, cribriform structures of Gleason pattern 4, and comedonecrosis of Gleason pattern 5. Unlike histology, micro-CT allowed us to further probe 3D tissue architecture in volumetric context. User-friendly exploration of sample volumes was achieved using a customized Neuroglancer multiplanar and 3D rendering interface. A sparsely trained cycleGAN produced plausible virtual H&E staining from the unstained micro-CT reconstructions.
Unlike tissue-section based histology, micro-CT-based virtual histology yields nondestructive 3D characterization of cancer cell and tissue architecture, including glandular spaces, without the undersampling or cutting artifacts of histology. These findings demonstrate the feasibility of PBCT-based 3D virtual histology of prostate cancer and suggest the exploration of derived quantitative analyses of tumor properties for potential contributions to patient care.

12

An Efficient and Interpretable Learning Approach for Large-Scale Histopathology Data

Moore, C.; Gupta, V.; Neupane, S.; Tripathi, H.

2026-05-03 health informatics 10.64898/2026.04.30.26352196 medRxiv

Top 0.1%

3.3%

Prostate cancer (PCa) remains one of the leading causes of cancer-related mortality among men, and histopathological analysis of prostate biopsy specimens is central to diagnosis and risk stratification. Whole-slide images (WSIs) capture rich morphological information, but their gigapixel scale and the large number of extracted tissue patches make exhaustive annotation and model training computationally expensive. Attention-based Multiple Instance Learning (MIL) has emerged as an effective weakly supervised framework for WSI analysis, enabling slide-level prediction without requiring patch-level annotations. However, training MIL models on large histopathology cohorts remains resource intensive because many extracted patches are non-informative, and some patches are often processed repeatedly during training. To address these challenges, we propose an efficient and interpretable learning framework for large-scale histopathology analysis. Our method combines a pathology-pretrained UNI encoder, a Clustering-constrained Attention Multiple instance learning-Single Branch (CLAM-SB) attention-based MIL model, and a window-based training strategy that reduces computational overhead while preserving predictive performance. We describe the proposed approach and report experiments on TCGA-PRAD WSIs from PCa patients. Processing 189,600 sampled patches across 79 WSIs with our proposed approach reduced total training time by 57.5% (20 to 8.5 hours for 5 epochs) and 41.4% (27 to 16 hours for 10 epochs), underscoring its potential as a practical and resource-efficient strategy for scalable prostate histopathology analysis.

13

Unsupervised Tissue Concepts for Explainable Sarcoma Subtype Prediction from H&E

Bisson, T.; Ingram, D.; Singh, S.; Li, A.; Flynn, S.; Wang, W.-L.; Kim, A. E.; Bridge, C. P.; Demicco, E. G.; Sorrentino, A.; Jiang, S.; Hung, Y. P.; Lazar, A. J.; Iafrate, A. J.

2026-05-20 pathology 10.64898/2026.05.15.26353333 medRxiv

Top 0.1%

3.3%

Soft tissue sarcomas are a rare, heterogeneous group of tumors whose diagnosis remains challenging because of overlapping morphology and limited access to sarcoma-specialized pathologists. Although pathology foundation models have shown promise in computational pathology, their clinical translation remains limited by insufficient interpretability, particularly in diagnostically complex settings such as sarcoma diagnosis. Here, we developed and evaluated an H&E-based AI framework for sarcoma subtype classification that focused on explainability. Using the CONCH v1.5 foundation model, we computed embeddings from a tissue microarray cohort of 2,545 cases spanning 19 sarcoma subtypes and trained an attention-based multiple-instance learning (ABMIL) model that achieved a balanced accuracy of 77.38% (SD 1.88). To move explainability beyond attention-based localization, we trained a sparse autoencoder on patch-level embeddings to learn 768 recurring visual concepts. Ninety high-activation concepts were reviewed by three senior pathologists and curated into morphologically meaningful and non-meaningful categories, yielding a semantic dictionary of 41 diagnostically relevant tissue concepts. We then trained a linear attention-based model on the 768-concept vectors, which retained much of the performance of the raw embedding-based ABMIL model, achieving a balanced accuracy of 73.74% (SD 1.30). When restricting the linear model to pathologist-curated morphologic concepts only, balanced accuracy further decreased to 67.04% (SD 1.27), suggesting that the residual performance gain in the full concept model was driven by inconsistent, technical, or diagnostically irrelevant concepts. Concept-level explanations of the curated linear attention-based model aligned with known sarcoma morphology, including lipogenic, myxoid, spindle-cell, pleomorphic, vascular, small round blue cell, and matrix-forming patterns, and reproduced patterns of diagnostic overlap observed in human sarcoma pathology.
Together, these results show that H&E-based foundation-model representations capture meaningful diagnostic structure within the known limitations of H&E in sarcoma diagnostics, but that their clinical value depends on whether this structure can be made interpretable to pathologists. Sparse autoencoder-derived concepts can address this critical gap by converting embedding-level signal into recurring morphologic patterns that pathologists can review and name, providing the foundation to link these patterns to subtype predictions. In doing so, this approach turns concept discovery into a practical form of diagnostic explanation, while also revealing where model performance is supported by recognizable histopathology and where it relies on diagnostically irrelevant or inconsistent visual patterns.

14

DigitAb: Domain-Adaptive Cell Type Prediction Method from Light Microscopy Images

Lucarelli, N.; Winfree, S.; Sabo, A.; Barwinska, D.; Ferkowicz, M.; Bowen, W.; Singh, A.; Chen, K.; Tatke, A.; Jen, K.-Y.; Eadon, M. T.; El-Achkar, T. M.; Jain, S.; Sarder, P.

2026-05-21 pathology 10.64898/2026.05.19.726313 medRxiv

Top 0.1%

2.8%

Light microscopy imaging with histological stains is central to disease diagnosis and research. It is enhanced with immunostaining to reveal cellular composition and complexity linked to clinical utility and biological mechanisms. Emerging multiplex imaging technologies like Phenocycler markedly increase the coverage of cellular diversity but are costly, technically demanding, and inaccessible to most clinical laboratories. We developed DigitAb, a deep learning framework that classifies cell types directly from hematoxylin and eosin (H&E) stained slides, eliminating the need for specialized assays. Using Phenocycler imaging, we generated high-resolution ground truths for ~3.5 million cells from 29 human kidney samples across four multi-institutional datasets to train a semantic segmentation model for 10 cell types, achieving a balanced accuracy of 0.78. By employing an integrated adversarial domain adaptation module, we tested DigitAb on unlabeled, previously untested kidney transplant and diabetic biopsy samples. We were able to predict several cell types from histology images alone, without using any special technology or immunostains, and demonstrated high concordance with the clinical gold-standard Banff schema in kidney transplant rejection and with clinical characteristics of diabetic nephropathy. Our cloud-based tool, DigitAb, provides scalable, accessible, label-free cellular segmentation for research and clinical pathology.

15

TopoMIL: Topology Improves Multiple Instance Learning in Diagnostic Microscopic Images

Kazeminia, S.; Dasdelen, M. F.; Rieck, B.; Marr, C.

2026-06-14 bioinformatics 10.64898/2026.06.10.731443 medRxiv

Top 0.1%

2.2%

Microscopic images of cells and tissues are central to disease diagnosis. In computational pathology, multiple instance learning (MIL) has emerged as a key paradigm for analyzing numerous images within a single patient sample. While the representative distribution of cells in a sample is important for diagnosis, existing MIL frameworks largely overlook it. We introduce TopoMIL, a framework that extracts the representative topological structure of the sample and integrates it into the MIL classifier. Three topological representations are assessed, each with distinct advantages and computational costs. We evaluate TopoMIL on four histopathology and cytomorphology datasets, each presenting unique challenges. Integrating the sample's topological information into MIL enhances classification across average, max, attention-based, and transformer pooling, yielding AUCROC gains of 3.3%, 4.2%, 5.9%, and 0.5%, respectively, with moderate computational cost. Our work underscores the potential of TopoMIL as a scalable extension to existing morphology-based models in computational pathology.
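To make the pooling variants named in this abstract concrete, here is a minimal sketch of three generic MIL pooling operators over a bag of instance embeddings. It illustrates the baseline operators only and is not TopoMIL itself: the topological features are not shown, the names `mean_pool`, `max_pool`, and `attention_pool` are hypothetical, and the attention score is simplified to a dot product with a single vector `w`, whereas published attention-based MIL models typically use a small learned network.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_pool(bag: Sequence[Vector]) -> Vector:
    """Average pooling: every instance contributes equally."""
    return [sum(column) / len(bag) for column in zip(*bag)]


def max_pool(bag: Sequence[Vector]) -> Vector:
    """Max pooling: keep the strongest response per feature."""
    return [max(column) for column in zip(*bag)]


def attention_pool(bag: Sequence[Vector], w: Vector) -> Vector:
    """Attention pooling with a simplified linear score (an assumption).

    Each instance is scored by its dot product with w, the scores are
    normalised with a softmax, and the bag embedding is the weighted sum.
    """
    scores = [sum(x * wi for x, wi in zip(instance, w)) for instance in bag]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(a * x for a, x in zip(weights, column)) for column in zip(*bag)]


# Toy bag of two 2-dimensional instance embeddings (made-up numbers).
toy_bag = [[1.0, 0.0], [3.0, 2.0]]
print(mean_pool(toy_bag), max_pool(toy_bag), attention_pool(toy_bag, [1.0, 0.0]))
```

All three operators are permutation-invariant, so none of them sees how instances are arranged relative to one another, which is the gap the abstract says topology is meant to fill.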

16

Automatic deep learning-based segmentation and quantification of stented arterial cross-sections for morphometric analysis

Kraftberger, M.; Spirgath, K.; Haase, T.; Bandelin, R.; Meyer, T.; Jaitner, N.; Tzschätzsch, H.

2026-04-30 pathology 10.64898/2026.04.28.721259 medRxiv

Top 0.1%

2.1%

Arterial vascular diseases, such as atherosclerosis, are among the most serious global health threats. In preclinical studies, morphometric analysis of histological arterial cross-sections is considered the gold standard for assessing vascular remodeling and the effectiveness of therapeutic interventions. However, morphometric analysis is usually performed manually, which is time-consuming, subjective, and requires significant user interaction. This paper presents a fully automated, operator-independent framework for the precise morphometric analysis of stented arterial cross-sections, extending the previously developed qHisto (quantitative histology) framework for the quantification of various histological components. A neural network for the segmentation of arterial structures was trained and evaluated using 819 cross-sections. In addition, a quantitative analysis of vascular morphology, fibrin area, and lumen asymmetry was performed using 72 cross-sections from coated and uncoated balloons. The model achieved high segmentation accuracy with a median Dice similarity coefficient of 0.892-0.996. Compared to manual evaluation, the system reduces analysis time by 90%, enabling efficient processing of large datasets. Furthermore, morphometric analysis with qHisto showed significant differences between coated and uncoated balloons, e.g. regarding lumen area (AUC = 0.86) and fibrin ratio (AUC = 0.94). Our developed framework enables fully automated, comprehensive and standardized analysis of histological arterial cross-sections. This helps to reduce time-consuming, repetitive manual assessments and thus facilitates research of disease mechanisms and treatment effects in preclinical studies.
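The Dice similarity coefficient quoted in this abstract measures overlap between a predicted mask and a reference mask. The sketch below assumes the standard definition, 2|A∩B| / (|A| + |B|), on flattened binary masks. The function name `dice_coefficient` and the toy masks are illustrative assumptions, not part of qHisto.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice similarity of two flattened binary masks (1 = structure, 0 = background)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if size_sum == 0:
        # Both masks are empty; treating this as perfect agreement is a convention.
        return 1.0
    return 2.0 * intersection / size_sum


# Toy masks with made-up pixels: one of two foreground pixels overlaps.
toy_pred = [1, 1, 0, 0]
toy_truth = [1, 0, 1, 0]
print(dice_coefficient(toy_pred, toy_truth))
```

A value of 1 means the masks coincide exactly and 0 means they share no foreground pixel.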

17

Interpretable machine learning for coeliac disease diagnosis: quantitative morphometry of duodenal biopsies

Bryant, R.; Romero Diaz, J.; Scott, A. G.; Sagdeo, A. A.; Jenkins, G. Z.; Richardson, R. A.; Chan, J. Y. C.; Arends, M. J.; Soilleux, E. J.; Jaeckle, F.

2026-06-03 pathology 10.64898/2026.06.02.26354731 medRxiv

Top 0.1%

2.0%

Background Coeliac disease affects approximately 1% of the global population and remains substantially underdiagnosed. Histopathological assessment of duodenal biopsies is the diagnostic gold standard but is subject to approximately 20% inter-observer disagreement. While machine learning approaches show promise, most prior work relies on black-box models with limited interpretability, restricting clinical adoption. Methods We present an interpretable pipeline that follows established histopathological criteria by extracting clinically meaningful morphological features from H&E-stained whole-slide images. Five sequential stages perform pre-processing, semantic segmentation of villi, crypts, intraepithelial lymphocytes (IELs) and enterocytes, crypt morphometry, villus length estimation via a novel polyline-based keypoint model, and coeliac disease classification using three quantitative features: IEL-to-enterocyte ratio, villus-to-crypt area ratio, and villus-length-to-crypt-depth ratio. Training and validation used data from four institutions; independent testing used 1,357 WSIs from two further institutions including one with a previously unseen scanner manufacturer, spanning five diagnostic categories: coeliac disease, normal mucosa, chronic inflammation, gastric metaplasia, and gastric heterotopia. Results Semantic segmentation achieved villus and crypt precision and recall of 87-90%. Villus length estimation correlated strongly with expert annotations (Pearson's r=0.85, mean relative error 13.5% post-calibration). All three morphological features significantly separated coeliac disease from all non-coeliac diagnostic groups across internal and external datasets (p<0.01 in all comparisons). On the test set the diagnostic classifier achieved accuracy 94.5%, PPV 92.9%, NPV 94.7%, and AUC 0.982. 
Conclusions This interpretable framework achieves strong multi-centre diagnostic performance while producing quantitative morphological outputs, villus length, crypt depth, and IEL-to-enterocyte ratios, that directly reflect established histopathological criteria, representing a meaningful step towards standardised AI-assisted coeliac disease diagnosis.

18

Can a Tissue-derived Progression Signature Accurately Predict Colorectal Cancer Stage Transitions in Blood?

Sarkar, P.; Sarkar, P.

2026-06-29 bioinformatics 10.64898/2026.06.23.734006 medRxiv

Top 0.1%

2.0%

Colorectal cancer (CRC) is difficult to track because its molecular changes are highly complex as the disease progresses, which hinders robust biomarker discovery. In this study, we developed a machine learning framework by integrating monotonic progression and the StepMiner approach. We conducted external validation to identify reproducible, consistent transcriptomic biomarkers associated with CRC progression. Publicly available gene expression datasets from GEO were analyzed across four disease states: normal colon, adenoma, primary colorectal cancer, and metastasis. First, we identified genes with monotonic expression, then used the StepMiner approach to identify genes that act as switches between stages. A balanced 74-gene signature was used for machine-learning classification with a Random Forest. External validation showed strong performance in tissue-based datasets. However, tissue-derived signatures performed poorly on plasma- and blood-based datasets, highlighting biological differences between transcriptomic profiles. Cross-filtering between tissue-derived genes and blood expression datasets was performed, which resulted in the selection of 62 blood-compatible gene signatures. Leakage-free retraining on GSE164191 achieved a mean AUC of 0.868 with balanced precision. Functional enrichment analysis showed that these genes are highly active in cancer growth. Specifically, the genes CBX3, S100A11, PDK4, NCOR1, and SOX4 demonstrated stable and reliable performance across the validation folds. Overall, our study presents a progression-aware transcriptomic framework for CRC biomarker discovery and demonstrates the importance of external validation. Additionally, we evaluate whether tissue-derived signatures can predict blood profiles. The proposed approach may help the future development of tissue-based diagnostics and minimally invasive liquid-biopsy strategies for CRC.
To ensure reproducibility, our proposed workflow was automated as a Nextflow pipeline. The tissue-derived model was deployed as an application utilizing Angular, ASP.NET Core, and Plumber (R).

19

A Multicenter Swedish Histopathology Image Dataset Of Pediatric Central Nervous System Tumors

Nyman, P.; Tampu, I. E.; Shamikh, A.; Prochazka, G.; Blystad, I.; Basmaci, E.; Diaz de Stahl, T.; Augustsson, P.; Zielinska-Chomej, K.; Cao, D.; von Salome, J.; Ardalan, A.; Somarajan, P. R.; Ljungman, G.; Lundberg, P.; Sandgren, J.; Haj-Hosseini, N.

2026-06-16 pathology 10.64898/2026.06.15.26355523 medRxiv

Top 0.1%

1.9%

Refined detection methods, more detailed tumor characterization, and adequate distinction between different pediatric tumor subtypes are necessary to improve diagnosis and treatment, enable precision medicine, and advance patient prognosis. However, the application of computational approaches to pediatric brain tumors remains limited, largely due to the lack of accessible datasets. To address part of this gap, we provide whole slide images (WSIs) of hematoxylin and eosin (H&E)-stained tissue sections from all pediatric central nervous system (CNS) samples collected in Sweden between 2013 and 2023. These data represent a population-based national cohort encompassing all six pediatric oncology centers in Sweden and are available through the Swedish Childhood Tumor Biobank (BTB). The dataset includes 1,446 WSIs of sufficient image quality with confirmed CNS tumor diagnoses, derived from 537 unique subjects (562 cases). In addition, diagnostically relevant clinical information is included. Corresponding whole-genome sequencing (WGS), whole-transcriptome sequencing (WTS), and methylation array data are available for most tumor samples through separate resources. This H&E dataset has been specifically curated to support artificial intelligence-based analyses, while also serving broader applications in medical research and education. When combined with matched molecular data, it provides a valuable resource for advancing multimodal and precision diagnostic approaches in the pediatric population. To enable the development of AI models, comprehensive datasets covering a wide range of pediatric brain tumors are needed.

20

Clinical Evaluation of Automated Self-Operated Transvaginal Ultrasound for Ovarian Stimulation Monitoring

Shavit, T.; Bortoletto, P.; Szychter, J.; Mendel, S.; Corcos, Y.; Petrozza, J.; Prisant, N.

2026-06-24 sexual and reproductive health 10.64898/2026.06.21.26356181 medRxiv

Top 0.2%

1.5%

Objective To evaluate the feasibility, safety, patient acceptance, and preliminary clinical relevance of automated self-operated transvaginal ultrasound for ovarian stimulation monitoring. Design Prospective observational pilot study. Subjects Ten women undergoing ovarian stimulation for in vitro fertilization or fertility preservation at a single high-volume private IVF center. Exposure Participants performed investigational self-operated transvaginal ultrasound examinations immediately following standard monitoring visits. Patients inserted and stabilized the ultrasound probe while ovarian and endometrial imaging was acquired through controlled motorized probe rotation without real-time anatomical guidance. Main Outcome Measure(s) The primary outcome was feasibility, defined as the generation of evaluable imaging datasets suitable for ovarian stimulation monitoring. Secondary outcomes included bilateral ovarian visualization, procedural safety, patient-reported outcomes, follicular assessment, and agreement of endometrial thickness measurements with standard transvaginal ultrasound. Result(s) Nineteen investigational scan attempts were performed, yielding 18 evaluable datasets (94.7%). Bilateral ovarian visualization was achieved in 16 of 18 evaluable examinations (88.9%), whereas partial ovarian visualization occurred in 2 examinations (11.1%). No adverse events, adverse device effects, vaginal injury, bleeding, or infection were observed. Patient-reported outcomes demonstrated high procedural acceptability, with all participants expressing willingness to reuse the system. Compared with standard transvaginal ultrasound monitoring, investigational self-operated acquisition significantly improved overall examination experience (Wilcoxon p=0.002). Investigational imaging demonstrated clinically relevant agreement with standard transvaginal ultrasound for follicular categorization and endometrial assessment. 
Counts of follicles ≥14 mm correlated strongly with mature oocyte recovery for both investigational and standard ultrasound measurements (Spearman ρ=0.83 and ρ=0.80, respectively). Endometrial thickness measurements also demonstrated strong correlation between modalities (Spearman ρ=0.91). Conclusion(s) This prospective pilot study demonstrates the feasibility of automated self-operated transvaginal ultrasound during ovarian stimulation monitoring. Investigational imaging generated clinically relevant monitoring information without observed safety concerns and was associated with high patient acceptance. These findings support further investigation of patient-operated acquisition strategies and standardized imaging workflows in reproductive medicine.
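The agreement figures in this abstract are Spearman rank correlations. As a minimal sketch, assuming the usual definition (the Pearson correlation of the two rank vectors, with tied values given their average rank), the coefficient can be computed as follows. The function names and the toy measurements are illustrative assumptions and do not reproduce the study data.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy, made-up thickness readings (mm) from two hypothetical modalities.
toy_standard = [7.1, 8.4, 9.0, 10.2, 11.5]
toy_investigational = [7.3, 8.1, 9.4, 9.9, 11.8]
print(spearman_rho(toy_standard, toy_investigational))
```

Because only ranks enter the calculation, the coefficient reflects whether two modalities order the measurements the same way, not whether their absolute values match.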